Low Latency Index Maintenance in Indri
Abstract
There has been a resurgence of interest in index maintenance (or incremental indexing) in the academic community in the last three years. Most of this work focuses on how to build indexes as quickly as possible, given the need to run queries during the build process. This work is based on a different set of assumptions than previous work. First, we focus on latency instead of throughput. We focus on reducing index latency (the amount of time between when a new document is available to be indexed and when it is available to be queried) and query latency (the amount of time that an incoming query must wait because of index processing). Additionally, we assume that users are unwilling to tune parameters to make the system more efficient. We show how this set of assumptions has driven the development of the Indri index maintenance strategy, and describe the details of our implementation.
Related resources
OSIR 2006 Second Workshop On
There has been a resurgence of interest in index maintenance (or incremental indexing) in the academic community in the last three years. Most of this work focuses on how to build indexes as quickly as possible, given the need to run queries during the build process. This work is based on a different set of assumptions than previous work. First, we focus on latency instead of throughput. We foc...
DUTIR at TREC 2007 Enterprise Track
This paper describes our experiments on the two tasks of the TREC 2007 Enterprise track. In the data preprocessing stage, we stripped non-letter characters from documents and queries. For Document Search, we built the index with Indri and Lemur, processed the query topics, and then retrieved relevant documents with Indri and Lemur. For Expert Search, we recognized candidates from the collection, establ...
Dynamic Collections in Indri
Text search engines have historically been designed for unchanging collections of documents. While this is fine for many applications, a growing number of important applications in news, finance, law and desktop search require indexes that can be efficiently updated. Previous research into supporting dynamic collections revolves around incremental methods. Incremental systems are optimized for ...
Pyndri: A Python Interface to the Indri Search Engine
We introduce pyndri, a Python interface to the Indri search engine. Pyndri allows access to Indri indexes from Python at two levels: (1) the dictionary and tokenized document collection, and (2) evaluating queries on the index. We hope that the release of pyndri will stimulate reproducible, open, and fast-paced IR research.
RMIT and Gunma University at NTCIR-9 GeoTime Task
We participated in the English-English and Japanese-Japanese subtasks. We selected the Indri search engine as a baseline to test our new class of indexing algorithms. English documents for Indri: Each document was converted to lowercase and written in TREC SGML format. We then indexed the collection using Krovetz stemming and stopword removal. English documents for Newt: Each document was conve...